feat: tool_call_alias and depends_on for intra-turn tool scheduling #50
Open
Simon-Free wants to merge 14 commits into SafeRL-Lab:main from
The three TestTokenSnapshotExtendedFields cases asserted cache_read / cache_creation fields that were removed in 620bbb2 ("fix: remove dead cache_read/cache_creation fields per review"). They have been failing ever since.

Delete test_checkpoint_extras.py -- its remaining cases were either trivial (test_store_imports_sys checks 'import sys' exists) or file-source text scans (TestCheckpointPrintsToStderr) which don't test user behavior.

Add tests/test_checkpoint_e2e.py with two real e2e scenarios:
- Drive agent.run with a mocked LLM that emits a Write tool_call; assert the checkpoint hook created a pre-edit backup of the original content.
- Same path but the file exceeds _MAX_FILE_SIZE -- assert the skip message lands on stderr only, not stdout.

This is the actual user-visible contract of PR SafeRL-Lab#47 and covers the full wiring agent.run -> Write hook -> checkpoint.store.track_file_edit.

The three behavior tests in test_checkpoint_store.py stay -- they cover the store function directly via capsys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Changes in this update

Removed: type coercion (split to separate PR #62)
Coercion is an independent concern - moved to its own PR with bug fixes.

Added: ID uniqueness enforcement
Ported id_uniquify.py from bouzecode. Without this, when the LLM reuses IDs like r1 across turns, you get duplicate tool_call_id errors from the API.

Fixed: input_schema-style schema injection
Scheduling props were injected at the wrong level for Anthropic-style schemas. Now correctly targets input_schema.properties.

Tests
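To illustrate the ID reuse problem described under "Added: ID uniqueness enforcement", here is a minimal sketch of one way to keep tool_call ids unique across turns. It is not the ported id_uniquify.py, whose contents are not shown here; the function name uniquify_tool_call_ids, the "id" key on each call, and the numeric suffix scheme are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def uniquify_tool_call_ids(tool_calls: List[dict], seen: Set[str]) -> Dict[str, str]:
    """Rename any id already present in `seen` (hypothetical helper).

    `seen` holds every id used so far in the conversation and is updated
    in place. Returns a mapping of original id -> id actually used.
    """
    renamed: Dict[str, str] = {}
    for call in tool_calls:
        original = call["id"]  # assumed key; real tool_call shape may differ
        candidate = original
        suffix = 2
        # Append _2, _3, ... until the id has not been used before.
        while candidate in seen:
            candidate = f"{original}_{suffix}"
            suffix += 1
        seen.add(candidate)
        renamed[original] = candidate
        call["id"] = candidate
    return renamed


seen_ids: Set[str] = set()
turn_1 = [{"id": "r1"}, {"id": "r2"}]
turn_2 = [{"id": "r1"}]  # the model reuses r1 in a later turn
uniquify_tool_call_ids(turn_1, seen_ids)
mapping = uniquify_tool_call_ids(turn_2, seen_ids)
```

In this sketch the second turn's r1 is renamed, so the API never sees the same tool_call_id twice. A real implementation would probably also need to rewrite depends_on references that point at a renamed id.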
e6db1fc to
e9a96f7
Compare
… Python versions)
Split _coerce_params (20 lines, nested try/except chain) into:
- a small orchestrator that walks params and delegates,
- four single-purpose coercers (_coerce_int / _coerce_float /
_coerce_bool / _coerce_json) dispatched through a _COERCERS map.
Each catching coercer still returns the original string on failure -- but
the intent is now explicit via a comment ("tool handler reports the real
type mismatch"), and the bare `except: pass` silent-pass pattern is gone.
Also fix test_scheduling_params_stripped which called execute_tool without
the required config arg; it has been failing since the pr4 branch landed.
Add tests/test_tool_scheduling_e2e.py that drives agent.run with a
mocked LLM:
- assert every schema sent to the stream carries tool_call_alias +
depends_on (proof the schema injection path is wired through the full
agent loop, not just a unit helper);
- register a "receiver" tool, let the LLM emit a tool_call with
scheduling params + one real param, assert the scheduling params are
gone and the real param reaches the handler.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e9a96f7 to
805b024
Compare
Author
Actually depends on #47 (needs conftest.py)
Author
Summary
Expose two scheduling hints to the LLM on every tool schema, and strip them before they reach tool handlers:
- tool_call_alias: string - optional alias the model can use to refer to a tool call by a short name later in the same turn.
- depends_on: string[] - list of prior tool_call_ids or aliases. The model uses this to express sequential dependencies between tools that it wants executed one after the other rather than in parallel.

Also coerces string-typed params (sent by LLMs that flatten JSON) into their schema-declared types:
- "42" -> 42 for an integer property,
- "true" -> True for boolean,
- '[{...}]' -> list/dict for array/object.

What's in scope here
This PR only injects the schema fields and performs param coercion + stripping. Runtime enforcement of depends_on ordering in the agent loop is not yet implemented - the model gets a hint and can call tools in order manually, but the registry does not re-order parallel executions based on depends_on. Deferred to a follow-up PR so this one stays small and review-friendly.

Changes
- tool_registry.py: _SCHEDULING_PROPS constant, _coerce_params split into per-type coercers dispatched through _COERCERS, wrapper over get_tool_schemas that injects scheduling props, wrapper over execute_tool that strips scheduling props and coerces types
- tests/test_tool_scheduling.py
- tests/test_tool_scheduling_e2e.py: agent.run + mocked stream: (1) every schema the LLM sees carries the scheduling props, (2) when the LLM emits a tool_call that includes tool_call_alias + depends_on, those are gone by the time the tool handler runs

Cleanups folded in
- _coerce_params' silent except (ValueError, json.JSONDecodeError): pass replaced by a dispatch table where each coercer explicitly returns the original value on failure, with a comment explaining the intent ("tool handler reports the real type mismatch").
- Fixed test_scheduling_params_stripped, which called execute_tool(name, params) without the required config arg - it was failing since the branch landed.

Ref #43
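As a rough illustration of the inject-then-strip flow in this description, the sketch below adds the two scheduling props to each schema and removes them from incoming params. Only the names _SCHEDULING_PROPS, tool_call_alias, depends_on and the input_schema.properties target come from this PR; the contents of the constant, the helper names inject_scheduling_props and strip_scheduling_props, and the fallback to a top-level "parameters" key for non-Anthropic schemas are assumptions.

```python
import copy
from typing import Any, Dict, List

# Assumed contents; the PR names this constant but its value is not shown here.
_SCHEDULING_PROPS: Dict[str, dict] = {
    "tool_call_alias": {
        "type": "string",
        "description": "Optional short alias for this tool call.",
    },
    "depends_on": {
        "type": "array",
        "items": {"type": "string"},
        "description": "Prior tool_call_ids or aliases this call depends on.",
    },
}


def inject_scheduling_props(schemas: List[dict]) -> List[dict]:
    """Return copies of the schemas with scheduling props added (hypothetical helper)."""
    injected = []
    for schema in schemas:
        schema = copy.deepcopy(schema)  # never mutate the registry's own schema
        if "input_schema" in schema:
            # Anthropic-style: props live under input_schema.properties.
            target = schema["input_schema"]
        else:
            # Assumed fallback for schemas using a top-level "parameters" key.
            target = schema.setdefault("parameters", {"type": "object"})
        props = target.setdefault("properties", {})
        for name, spec in _SCHEDULING_PROPS.items():
            # setdefault keeps a tool's own property if the name collides.
            props.setdefault(name, copy.deepcopy(spec))
        injected.append(schema)
    return injected


def strip_scheduling_props(params: Dict[str, Any]) -> Dict[str, Any]:
    """Drop scheduling hints so the tool handler only sees its real params."""
    return {k: v for k, v in params.items() if k not in _SCHEDULING_PROPS}


registry_schemas = [
    {
        "name": "Write",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
        },
    }
]
sent_to_llm = inject_scheduling_props(registry_schemas)
handler_params = strip_scheduling_props(
    {"path": "a.txt", "tool_call_alias": "w1", "depends_on": ["r1"]}
)
```

Working on deep copies keeps the registered schemas unchanged, so the hints exist only in what the model sees and never leak into a handler's declared interface.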